Lexicons for Human Language Technology
Abstract
Information about words--their pronunciation, syntax and meaning--is a crucial and costly part of human language technology. Many questions remain about the best way to express and use such lexical information. Nevertheless, much of this information is common to all current approaches, and therefore the effort to collect it can usefully be shared. The Linguistic Data Consortium (LDC) has undertaken to provide such common lexical information for the community of HLT researchers. The purpose of this paper is to sketch the various LDC lexical projects now underway or planned, and to solicit feedback from the community of HLT researchers.
Similar resources
SIMPLE: A General Framework for the Development of Multilingual Lexicons
The LE-SIMPLE project is an innovative attempt to build harmonized syntactic-semantic lexicons for 12 European languages, aimed at use in different Human Language Technology applications. SIMPLE provides a general design model for the encoding of a large amount of semantic information, spanning from ontological typing, to argument structure and terminology. SIMPLE thus provides a general fra...
MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs
In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...
The EAGLES/ISLE initiative for setting standards: the Computational Lexicon Working Group for Multilingual Lexicons
ISLE (International Standards for Language Engineering), a transatlantic standards-oriented initiative under the Human Language Technology (HLT) programme, is a continuation of the long-standing EAGLES (Expert Advisory Group for Language Engineering Standards) initiative, and is carried out by European and American groups within the EU-US International Research Co-operation, supported by EC and...
Efficient Development of Lexical Language Resources and their Representation
Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, represent the driving forces for the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid construction of morphologic and phonetic lexico...
Session 1: Lexicons, Corpora, and Evaluation
Our technologies for collecting, storing, and disseminating vast amounts of information have gotten ahead of our technologies for collating and analyzing it, and that situation has posed a serious challenge for human language technology. As a consequence, natural language processing has been moving rapidly towards large-scale systems addressed to real tasks. Demos that won't scale up are no lon...
lex4all: A language-independent tool for building and evaluating pronunciation lexicons for small-vocabulary speech recognition
This paper describes lex4all, an open-source PC application for the generation and evaluation of pronunciation lexicons in any language. With just a few minutes of recorded audio and no expert knowledge of linguistics or speech technology, individuals or organizations seeking to create speech-driven applications in low-resource languages can build lexicons enabling the recognition of small vocabu...